1 Introduction



Principal component analysis (PCA) is a well-known method for extracting linear structures from high-dimensional datasets. It computes the linear subspace that best approximates the dataset in the Euclidean (least-squares) sense. This method benefits from efficient implementations based either on solving an eigenvalue problem or on iterative algorithms. We refer to [57] for details. In a similar fashion, multi-dimensional scaling [3] [35] [?] addresses the problem of finding the linear subspace that best preserves the pairwise distances. More recently, new algorithms have been proposed to compute low-dimensional embeddings of high-dimensional data. For instance, Isomap, LLE (Locally Linear Embedding) [42] and CDA (Curvilinear Distance Analysis) [9] aim at reproducing in the projection space the structure of the initial local neighborhoods. These methods are mainly dedicated to visualization purposes. They do not provide an analytic form of the transformation function, which makes it difficult to map new points into the dimensionality-reduced space. Besides, since they rely on local properties of pairwise distances, these methods are sensitive to noise and outliers. We refer to [38] for a comparison between Isomap and CDA and to [48] for a comparison between some features of LLE and Isomap.

Finding nonlinear structures is a challenging problem. An important family of methods focuses on self-consistent structures. The self-consistency concept is precisely defined in [35]. Geometrically speaking, it means that each point of the structure is the mean of all points that project orthogonally onto it. For instance, it can be shown that the $k$-means algorithm [23] converges to a set of $k$ self-consistent points. Principal curves and surfaces [?] [24] [37] [47] are examples of one-dimensional and two-dimensional self-consistent structures. Their practical computation requires solving a nonlinear optimization problem. The solution is usually not robust and suffers from a high estimation bias. In [31], a polygonal algorithm is proposed to reduce this bias. Higher-dimensional self-consistent structures are often referred to as self-consistent manifolds, even though their existence is not guaranteed for arbitrary datasets. An estimation algorithm based on a grid approximation is proposed in [19]. The fitting criterion involves two smoothness penalty terms describing the elastic properties of the manifold.
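For a finite set of points, "projecting orthogonally onto the structure" amounts to assigning each observation to its nearest point, so the self-consistency of $k$ points is the fixed-point condition of Lloyd's $k$-means iterations. The Python/NumPy sketch below illustrates this; the function name `kmeans` and the three-blob toy data are hypothetical choices, not taken from [23]. At convergence, each centre should coincide with the mean of the observations assigned to it.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd iterations: nearest-centre assignment, then move each centre to its cell mean."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # orthogonal projection onto a finite set of points = nearest centre
        labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):      # fixed point reached: centres are self-consistent
            break
        centres = new
    return centres, labels

# Hypothetical toy data: three Gaussian blobs in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in ([0, 0], [3, 0], [0, 3])])
centres, labels = kmeans(X, 3)
for j in range(3):
    # each centre coincides with the mean of the points that project onto it
    print(np.allclose(centres[j], X[labels == j].mean(axis=0)))
```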

In this paper, auto-associative models are proposed as candidates for the generalization of PCA. We show in Subsection 2.1 that these models are dedicated to the approximation of the dataset by a manifold. Here, the word "manifold" refers to the topological properties of the structure [39]. The approximating manifold is built by a projection pursuit algorithm presented in Subsection 2.2. At each step of the algorithm, the dimension of the manifold is incremented. Some theoretical properties are provided in Subsection 2.3. In particular, we show that, at each step of the algorithm, the mean residual norm does not increase. Moreover, it is also established that the algorithm converges in a finite number of steps. Section 3 is devoted to the presentation of some particular auto-associative models. They are compared to the classical PCA and to some neural network models. Implementation aspects are discussed in Section 4. We show that, in numerous cases, no optimization procedure is required. Some illustrations on simulated and real data are presented in Section 5.



2 Auto-associative models 

In this section, for each unit vector $a \in \mathbb{R}^p$, we denote by $P_a(\cdot) = \langle a, \cdot \rangle$ the linear projection from $\mathbb{R}^p$ to $\mathbb{R}$. Besides, for any set $E$, the identity function $E \to E$ is denoted by $\mathrm{Id}_E$.



2.1 Approximation by manifolds 

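A function $F^d: \mathbb{R}^p \to \mathbb{R}^p$ is a $d$-dimensional auto-associative function if there exist $d$ orthonormal vectors $a^k$, called principal directions, and $d$ continuously differentiable functions $s^k: \mathbb{R} \to \mathbb{R}^p$, called regression functions, such that

$$P_{a^j} \circ s^k = \delta_{j,k}\, \mathrm{Id}_{\mathbb{R}} \quad \text{for all } 1 \le j \le k \le d, \tag{1}$$

where $\delta_{j,k}$ is the Kronecker symbol and

$$F^d = \left(\mathrm{Id}_{\mathbb{R}^p} - s^d \circ P_{a^d}\right) \circ \dots \circ \left(\mathrm{Id}_{\mathbb{R}^p} - s^1 \circ P_{a^1}\right) = \prod_{k=d}^{1} \left(\mathrm{Id}_{\mathbb{R}^p} - s^k \circ P_{a^k}\right). \tag{2}$$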

The main property of auto-associative functions is a consequence of (1):

Theorem 1 The equation $F^d(x) = 0$, $x \in \mathbb{R}^p$, defines a differentiable $d$-dimensional manifold of $\mathbb{R}^p$.

We refer to [16] for a proof. Thus, the equation $F^d(x) = 0$ defines a space in which every point has a neighborhood that resembles the Euclidean space $\mathbb{R}^d$, but in which the global structure may be more complicated. For example, on a 1-dimensional manifold, every point has a neighborhood that resembles a line; on a 2-dimensional manifold, every point has a neighborhood that looks like a plane. Examples of 2-dimensional manifolds include the sphere and the surface of a torus.

Now, let $X$ be a square-integrable random vector of $\mathbb{R}^p$. Assume, without loss of generality, that $X$ is centered and introduce $\sigma^2(X) = E[\|X\|^2]$. For any auto-associative function $F^d$, let us consider $\varepsilon = F^d(X)$. Note that, from the results of Subsection 2.3 below, $\varepsilon$ is necessarily a centered random vector. In this context, $\sigma^2(\varepsilon)$ is called the residual variance. Geometrically speaking, the realizations of the random vector $X$ are approximated by the manifold $F^d(x) = 0$, $x \in \mathbb{R}^p$, and $\sigma^2(\varepsilon)$ represents the variance of $X$ "outside" the manifold.

Of course, such a random vector $X$ always satisfies a 0-dimensional auto-associative model with $F^0 = \mathrm{Id}_{\mathbb{R}^p}$ and $\sigma^2(\varepsilon) = \sigma^2(X)$. Similarly, $X$ always satisfies a $p$-dimensional auto-associative model with $F^p = 0$ and $\sigma^2(\varepsilon) = 0$. In practice, it is important to find a balance between these two extreme cases by constructing a $d$-dimensional model with $d \ll p$ and $\sigma^2(\varepsilon) \ll \sigma^2(X)$. For instance, in the case where the covariance matrix $\Sigma$ of $X$ is of rank $d$, $X$ is located on a $d$-dimensional linear subspace defined by the equation $F^d_{\mathrm{PCA}}(x) = 0$ with

$$F^d_{\mathrm{PCA}}(x) = x - \sum_{k=1}^{d} P_{a^k}(x)\, a^k, \tag{3}$$
and where $a^k$, $k = 1, \dots, d$, are the eigenvectors of $\Sigma$ associated with the positive eigenvalues. A little algebra shows that (3) can be rewritten as $F^d(x) = 0$, where $F^d$ is a $d$-dimensional auto-associative function with linear regression functions $s^k(t) = t a^k$ for $k = 1, \dots, d$. Moreover, we have $\sigma^2(\varepsilon) = 0$. Since (3) is the model produced by a PCA, it follows directly that PCA is a special (linear) case of auto-associative models. In the next subsection, we propose an algorithm to build auto-associative models with not necessarily linear regression functions, small dimension and small residual variance. Such models could also be called "semi-linear" or "semi-parametric", since they include a linear/parametric part through the use of linear projection operators and a nonlinear/nonparametric part through the regression functions.
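The composition (2) and its PCA special case (3) can be checked numerically. The Python/NumPy sketch below is an illustration under stated assumptions: the helper name `auto_associative`, the random orthonormal directions (any orthonormal $a^k$ work for this algebraic identity, not only eigenvectors) and the dimensions are hypothetical. It applies the factors $\mathrm{Id}_{\mathbb{R}^p} - s^k \circ P_{a^k}$ for $k = 1, \dots, d$, checks condition (1) for $s^k(t) = t a^k$, and compares the result with $F^d_{\mathrm{PCA}}$.

```python
import numpy as np

def auto_associative(directions, regressions):
    """Return x -> F^d(x) as in (2): the factor for k = 1 is applied first."""
    def F(x):
        r = np.asarray(x, dtype=float)
        for a, s in zip(directions, regressions):
            r = r - s(a @ r)               # (Id - s^k o P_{a^k})(r)
        return r
    return F

rng = np.random.default_rng(0)
p, d = 5, 2
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
dirs = [Q[:, k] for k in range(d)]                  # orthonormal a^1, ..., a^d
regs = [lambda t, a=a: t * a for a in dirs]         # linear s^k(t) = t a^k

# condition (1): P_{a^j}(s^k(t)) = delta_{jk} t
t = 1.7
print(all(np.isclose(dirs[j] @ regs[k](t), t * (j == k)) for j in range(d) for k in range(d)))

F = auto_associative(dirs, regs)
x = rng.standard_normal(p)
F_pca = x - sum((a @ x) * a for a in dirs)          # equation (3)
print(np.allclose(F(x), F_pca))
```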



2.2 A projection pursuit algorithm 

Let us recall that, given a unit vector $a \in \mathbb{R}^p$, an index $I: \mathbb{R} \to \mathbb{R}$ is a functional measuring the interest of the projection $P_a(X)$ by a non-negative real number. The meaning of the word "interest" depends on the considered data analysis problem. For instance, a possible choice of $I$ is the projected variance $I \circ P_a(\cdot) = \mathrm{Var}[P_a(\cdot)]$. Some other examples are presented in Subsection 4.2. Thus, the maximization of $I \circ P_a(X)$ with respect to $a$ yields the most interesting direction for this given criterion. An algorithm performing such an optimization is called a projection pursuit algorithm. We refer to [26] and [28] for a review on this topic.

Let $d \in \{0, \dots, p\}$ and consider the following algorithm, which consists in iteratively applying the following steps: [A] computation of the Axes, [P] Projection, [R] Regression and [U] Update:

Algorithm 1 Define $R^0 = X$.
For $k = 1, \dots, d$:

[A] Determine $a^k = \arg\max_x I \circ P_x(R^{k-1})$ s.t. $\|x\| = 1$, $P_{a^j}(x) = 0$, $1 \le j < k$.

[P] Compute $Y^k = P_{a^k}(R^{k-1})$.

[R] Estimate $s^k(t) = E[R^{k-1} \mid Y^k = t]$.

[U] Compute $R^k = R^{k-1} - s^k(Y^k)$.

The random variables $Y^k$ are called principal variables and the random vectors $R^k$ residuals. Step [A] consists in computing an axis orthogonal to the previous ones and maximizing a given index $I$. Step [P] consists in projecting the residuals on this axis to determine the principal variables, and step [R] is devoted to the estimation of the regression function of the principal variables that best approximates the residuals. Step [U] simply consists in updating the residuals. Thus, Algorithm 1 can be seen as a projection pursuit regression algorithm [14] [32], since it combines a projection pursuit step [A] and a regression step [R]. The main problem of such approaches is to define an efficient way to iterate from $k$ to $k + 1$. Here, the key property is that the residuals $R^k$ are orthogonal to the axis $a^k$, since

$$\begin{aligned}
P_{a^k}(R^k) &= P_{a^k}(R^{k-1}) - P_{a^k} \circ s^k(Y^k) \\
&= P_{a^k}(R^{k-1}) - E\left[P_{a^k}(R^{k-1}) \mid Y^k\right] \\
&= Y^k - E\left[Y^k \mid Y^k\right] \\
&= 0.
\end{aligned} \tag{4}$$

Thus, it is natural to iterate the model construction in the subspace orthogonal to $a^k$, see the orthogonality constraint in step [A]. The theoretical results provided in the next subsection are mainly consequences of this property.
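An empirical version of Algorithm 1 on an $n \times p$ sample can be sketched as follows; it is an illustration, not the authors' implementation. The names `projection_pursuit`, `variance_axis` and `linear_regression` are hypothetical. Steps [A] and [R] are passed in as functions, and the defaults (projected-variance index, least-squares linear regression $s^k(t) = t b^k$) are simple placeholder choices. The final line checks the orthogonality property (4) on the sample. Any plug-in regression should keep the $a^k$-coordinate of its fitted values equal to $Y^k$ (as in Subsection 4.1.2); otherwise (4) holds only approximately.

```python
import numpy as np

def variance_axis(R, prev_axes):
    """Step [A] with the projected-variance index. The residuals are orthogonal to
    prev_axes, so the leading eigenvector of their covariance is orthogonal to them
    too (as long as its eigenvalue is positive); prev_axes is not used explicitly."""
    _, U = np.linalg.eigh(R.T @ R / len(R))
    return U[:, -1]

def linear_regression(y, R):
    """Step [R] restricted to s(t) = t b, with b fitted by least squares."""
    b = R.T @ y / (y @ y)
    return np.outer(y, b)                  # fitted values s(Y_i), one row per observation

def projection_pursuit(X, d, find_axis=variance_axis, regress=linear_regression):
    R = X - X.mean(axis=0)                 # R^0 = X, centred
    axes, Y = [], []
    for _ in range(d):
        a = find_axis(R, axes)             # [A]
        y = R @ a                          # [P] principal variable Y^k
        R = R - regress(y, R)              # [R] then [U]: R^k = R^{k-1} - s^k(Y^k)
        axes.append(a)
        Y.append(y)
    return np.array(axes), np.column_stack(Y), R

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
A, Y, R = projection_pursuit(X, d=2)
print(np.abs(R @ A.T).max())               # round-off level: residuals orthogonal to a^1, a^2
```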

2.3 Theoretical results

Based on (4), it is easily shown by induction that both the residuals and the regression functions computed at iteration $k$ are almost surely (a.s.) orthogonal to the axes computed before. More precisely, one has

$$\langle a^j, R^k \rangle = 0 \ \text{ a.s. for all } 1 \le j \le k \le d, \tag{5}$$

$$\langle a^j, s^k(Y^k) \rangle = 0 \ \text{ a.s. for all } 1 \le j < k \le d. \tag{6}$$

Besides, the residuals, principal variables and regression functions are centered:

$$E[R^k] = E[Y^k] = E[s^k(Y^k)] = 0,$$

for all $1 \le k \le d$. Our main result is the following:

Theorem 2 Algorithm 1 builds a $d$-dimensional auto-associative model with principal directions $\{a^1, \dots, a^d\}$, regression functions $\{s^1, \dots, s^d\}$ and residual $\varepsilon = R^d$. Moreover, one has the expansion

$$X = \sum_{k=1}^{d} s^k(Y^k) + R^d, \tag{7}$$

where the principal variables $Y^k$ and $Y^{k+1}$ are centered and uncorrelated for $k = 1, \dots, d - 1$.

The proof is a direct consequence of the orthogonality properties (5) and (6). Let us highlight that, for $d = p$, expansion (7) yields an exact expansion of the random vector $X$:

$$X = \sum_{k=1}^{p} s^k(Y^k),$$

since $R^p = 0$ (a.s.) in view of (5). Finally, note that the approximation properties of the conditional expectation entail that the sequence of mean squared residual norms $E[\|R^k\|^2]$ is non-increasing. As a consequence, the following corollary will prove useful to select the model dimension, similarly to the PCA case.

Corollary 1 Let $Q_d$ be the information ratio represented by the $d$-dimensional auto-associative model:

$$Q_d = 1 - \sigma^2(R^d) / \sigma^2(X).$$

Then, $Q_0 = 0$, $Q_p = 1$ and the sequence $(Q_d)$ is non-decreasing.
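As an illustration of Corollary 1, the information ratio can be estimated by replacing $\sigma^2(\cdot)$ with the sample mean squared norm. The sketch below assumes, for simplicity, the PCA model (3), so that $R^d$ is obtained by removing the projections of $X$ onto the $d$ leading eigenvectors of its covariance. The helper name `information_ratio` and the toy data are hypothetical. It computes $Q_0, \dots, Q_p$.

```python
import numpy as np

def information_ratio(X, R):
    """Q_d = 1 - sigma^2(R^d) / sigma^2(X), sigma^2 estimated by the mean squared norm."""
    return 1.0 - np.mean(np.sum(R ** 2, axis=1)) / np.mean(np.sum(X ** 2, axis=1))

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
X -= X.mean(axis=0)

# PCA model (3): R^d = X - sum_{k<=d} <a^k, X> a^k with a^k the leading eigenvectors.
_, U = np.linalg.eigh(X.T @ X / len(X))
U = U[:, ::-1]                                   # eigenvectors by decreasing eigenvalue
Q = []
for d in range(X.shape[1] + 1):
    A = U[:, :d]
    R = X - X @ A @ A.T
    Q.append(information_ratio(X, R))
print(np.round(Q, 3))                            # starts at 0, ends at 1, non-decreasing
```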

Note that all these properties are quite general, since they depend neither on the index $I$ nor on the estimation method for the conditional expectation. In the next section, we show how, in particular cases, additional properties can be obtained.

3 Examples 

We first focus on the auto-associative models which can be obtained using linear estimators of 
the regression functions. The existing links with PCA are highlighted. Second, we introduce 
the intermediate class of additive auto-associative models and compare it to some neural network 
approaches. 

3.1 Linear auto-associative models and PCA



Here, we limit ourselves to linear estimators of the conditional expectation in step [R]. At iteration $k$, we thus assume

$$s^k(t) = t\, b^k, \quad t \in \mathbb{R},\ b^k \in \mathbb{R}^p.$$

Standard optimization arguments (see [15], Proposition 2) show that, necessarily, the regression function obtained at step [R] is located on the axis

$$b^k = \Sigma_{k-1} a^k / \left({}^t a^k\, \Sigma_{k-1}\, a^k\right), \tag{8}$$

with $\Sigma_{k-1}$ the covariance matrix of $R^{k-1}$:

$$\Sigma_{k-1} = E\left[R^{k-1}\, {}^t R^{k-1}\right], \tag{9}$$

and where, for any matrix $M$, the transposed matrix is denoted by ${}^t M$. As a consequence of Theorem 2, we have the following linear expansion:

$$X = \sum_{k=1}^{d} Y^k b^k + R^d.$$

As an interesting additional property of these so-called linear auto-associative models, we have $E[Y^j Y^k] = 0$ for all $1 \le j < k \le d$. This property is established in [18], Proposition 2. Therefore, restricting step [R] to a family of linear functions recovers an important property of PCA models: the non-correlation of the principal variables. We now show that Algorithm 1 can also compute a PCA model for a well-suited choice of the index.
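The non-correlation property does not depend on how the axes are chosen in step [A], as long as step [R] uses (8). The Python/NumPy sketch below is illustrative and uses a hypothetical helper `linear_aa_model`. It replaces $\Sigma_{k-1}$ by the empirical covariance of the current residuals (the counterpart (13) of Section 4) and uses arbitrary orthonormal axes instead of an optimized index. It then checks that the empirical principal variables are uncorrelated and that the linear expansion holds exactly on the sample.

```python
import numpy as np

def linear_aa_model(X, axes):
    """Linear auto-associative model: s^k(t) = t b^k with b^k given by (8)."""
    R = X - X.mean(axis=0)
    Ys, bs = [], []
    for a in axes:                                   # axes chosen by any index
        V = R.T @ R / len(R)                         # empirical Sigma_{k-1}
        b = V @ a / (a @ V @ a)                      # equation (8)
        y = R @ a                                    # [P]
        R = R - np.outer(y, b)                       # [U]: R^k = R^{k-1} - Y^k b^k
        Ys.append(y)
        bs.append(b)
    return np.array(Ys).T, np.array(bs), R

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))    # arbitrary orthonormal axes (not PCA)
Y, B, R = linear_aa_model(X, Q[:, :3].T)
C = Y.T @ Y / len(Y)                                 # empirical covariance of Y^1, Y^2, Y^3
print(np.allclose(C, np.diag(np.diag(C))))          # principal variables uncorrelated
print(np.allclose(X - X.mean(axis=0), Y @ B + R))   # X = sum_k Y^k b^k + R^d
```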

Proposition 1 If the index in step [A] is the projected variance, i.e.

$$I \circ P_x(R^{k-1}) = \mathrm{Var}\left[P_x(R^{k-1})\right],$$

and step [R] is given by (8), then Algorithm 1 computes the PCA model of $X$.

Indeed, the solution $a^k$ of step [A] is the eigenvector associated with the maximum eigenvalue of $\Sigma_{k-1}$. From (8), it follows that $b^k = a^k$. Replacing in (2), we obtain, for orthogonality reasons,

$$F^d = F^d_{\mathrm{PCA}}.$$
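Proposition 1 can be checked on a sample. With the projected-variance index, $a^k$ is the leading eigenvector of the residual covariance, (8) gives $b^k = a^k$, and the axes coincide, up to sign, with the leading eigenvectors of the covariance of $X$. The short NumPy sketch below uses hypothetical toy data with well-separated variances, so that each eigenvector is defined uniquely up to sign.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((400, 4)) @ np.diag([4.0, 3.0, 2.0, 1.0])
X -= X.mean(axis=0)

R, axes = X.copy(), []
for _ in range(3):
    V = R.T @ R / len(R)                 # empirical Sigma_{k-1}
    a = np.linalg.eigh(V)[1][:, -1]      # [A]: projected-variance index
    b = V @ a / (a @ V @ a)              # [R]: equation (8)
    assert np.allclose(b, a)             # b^k = a^k
    R = R - np.outer(R @ a, b)           # [U]
    axes.append(a)

_, U0 = np.linalg.eigh(X.T @ X / len(X))
pca_axes = U0[:, ::-1][:, :3].T          # three leading eigenvectors of the covariance of X
print(np.allclose(np.abs(np.array(axes) @ pca_axes.T), np.eye(3), atol=1e-6))
```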

3.2 Additive auto-associative models and neural networks 

A $d$-dimensional auto-associative function is called additive if (2) can be rewritten as

$$F^d = \mathrm{Id}_{\mathbb{R}^p} - \sum_{k=1}^{d} s^k \circ P_{a^k}. \tag{10}$$

In [17], the following characterization of additive auto-associative functions is provided. A $d$-dimensional auto-associative function is additive if and only if

$$P_{a^j} \circ s^k = \delta_{j,k}\, \mathrm{Id}_{\mathbb{R}} \quad \text{for all } (j, k) \in \{1, \dots, d\}^2.$$

As a consequence, we have: 

Theorem 3 In the linear subspace spanned by $\{a^1, \dots, a^d\}$, every $d$-dimensional additive auto-associative model reduces to the PCA model.

A similar result can be established for the nonlinear PCA based on a neural network introduced in [29]. The proposed model is obtained by introducing a nonlinear function $g: \mathbb{R} \to \mathbb{R}$, called activation function, in the PCA model (3) to obtain

$$F^d_g(x) = x - \sum_{k=1}^{d} g \circ P_{a^k}(x)\, a^k. \tag{11}$$

Note that (11) is an additive auto-associative model as defined in (10) if and only if $g = \mathrm{Id}_{\mathbb{R}}$, i.e. if and only if it reduces to the PCA model in the linear subspace spanned by $\{a^1, \dots, a^d\}$. Moreover, in all cases, we have

$$\{F^d_g(x) = 0,\ x \in \mathbb{R}^p\} \subset \{F^d_{\mathrm{PCA}}(x) = 0,\ x \in \mathbb{R}^p\},$$

which means that this model is included in the PCA one. More generally, the auto-associative perceptron with one hidden layer [7] is based on multidimensional activation functions $\sigma^k: \mathbb{R} \to \mathbb{R}^p$:

$$F^d_{\mathrm{AP}}(x) = x - \sum_{k=1}^{d} \sigma^k \circ P_{a^k}(x). \tag{12}$$

Unfortunately, it can be shown [?] that a single hidden layer is not sufficient: linear activation functions (leading to a PCA) already yield the best approximation of the data. In other words, the nonlinearity introduced in (12) has no significant effect on the final approximation of the dataset. Besides, determining $a^k$, $k = 1, \dots, d$, is a highly nonlinear problem with numerous local minima, and is thus very dependent on the initialization.

4 Implementation aspects 

In this section, we focus on the implementation aspects of Algorithm 1. Starting from an $n$-sample $\{X_1, \dots, X_n\}$, two problems are addressed. In Subsection 4.1, we propose some simple methods to estimate the regression functions $s^k$ appearing in step [R]. In Subsection 4.2, the choice of the index in step [A] is discussed. In particular, we propose a contiguity index whose maximization is explicit.

4.1 Estimation of the regression functions 

4.1.1 Linear auto-associative models 

To estimate the regression functions, the simplest solution is to use a linear approach, leading to a linear auto-associative model. In this case, the regression axis is explicit, see (8), and it suffices to replace $\Sigma_{k-1}$ defined in (9) by its empirical counterpart

$$V_{k-1} = \frac{1}{n} \sum_{i=1}^{n} R_i^{k-1}\, {}^t R_i^{k-1}, \tag{13}$$

where $R_i^{k-1}$ is the residual associated with $X_i$ at iteration $k - 1$.

4.1.2 Nonlinear auto-associative models 

Let us now focus on nonlinear estimators of the conditional expectation $s^k(t) = E[R^{k-1} \mid Y^k = t]$, $t \in \mathbb{R}$. Let us highlight that $s^k$ is a function of a single variable, and thus its estimation does not suffer from the curse of dimensionality. This important property is a consequence of the "bottleneck" trick used in (2) and, more generally, in neural network approaches. The key point is that, even though $s^k \circ P_{a^k}$ is a $p$-variate function, its construction only requires the nonparametric estimation of a univariate function, thanks to the projection operator.

For the sake of simplicity, we propose to work in the orthonormal basis $B^k$ of $\mathbb{R}^p$ obtained by completing $\{a^1, \dots, a^k\}$. Let us denote by $R_j^{k-1}$ the $j$-th coordinate of $R^{k-1}$ in $B^k$. In view of (5), $R_j^{k-1} = 0$ for $j = 1, \dots, k - 1$. Besides, from step [P], $R_k^{k-1} = Y^k$. Thus, the estimation of $s^k(t)$ reduces to the estimation of $p - k$ functions

$$s_j^k(t) = E\left[R_j^{k-1} \mid Y^k = t\right], \quad j = k + 1, \dots, p.$$

This standard problem [22] [?] can be tackled either by kernel [2] or projection [20] estimates.
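One possible way to obtain the basis $B^k$ and the coordinates $R_j^{k-1}$ is sketched below with NumPy. It is an assumption made for illustration, not a construction prescribed by the method. A QR factorization of $[a^1, \dots, a^k, Z]$, with $Z$ a random $p \times (p - k)$ matrix, yields an orthonormal basis whose first $k$ vectors are $\pm a^1, \dots, \pm a^k$, and the signs are then fixed. The helper name `complete_basis` is hypothetical. The check reproduces the two facts used above: coordinates $j < k$ vanish and coordinate $k$ equals $Y^k$ (here with $k = 2$).

```python
import numpy as np

def complete_basis(axes, seed=0):
    """Orthonormal basis of R^p (as columns) whose first k columns are a^1, ..., a^k."""
    A = np.column_stack(axes)                          # p x k, orthonormal columns
    p, k = A.shape
    Z = np.random.default_rng(seed).standard_normal((p, p - k))
    Q, _ = np.linalg.qr(np.hstack([A, Z]))             # first k columns equal +/- a^j
    Q[:, :k] *= np.sign(np.sum(Q[:, :k] * A, axis=0))  # undo possible sign flips
    return Q

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
X -= X.mean(axis=0)
a1 = np.linalg.eigh(X.T @ X)[1][:, -1]
R1 = X - np.outer(X @ a1, a1)                          # residual R^1 of a linear first step
a2 = np.linalg.eigh(R1.T @ R1)[1][:, -1]
B = complete_basis([a1, a2])
C = R1 @ B                                             # coordinates of R^1 in B^2, one row per X_i
print(np.allclose(C[:, 0], 0), np.allclose(C[:, 1], R1 @ a2))
```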

Kernel estimates. Each coordinate $j \in \{k + 1, \dots, p\}$ of the estimator can be written in the basis $B^k$ as:

$$\hat{s}_j^k(t) = \sum_{i=1}^{n} R_{j,i}^{k-1}\, K\left(\frac{t - Y_i^k}{h}\right) \Big/ \sum_{i=1}^{n} K\left(\frac{t - Y_i^k}{h}\right), \tag{14}$$

where $R_{j,i}^{k-1}$ represents the $j$-th coordinate of the residual associated with the observation $X_i$ at the $(k-1)$-th iteration in the basis $B^k$, $Y_i^k$ is the value of the $k$-th principal variable for the observation $X_i$, and $K$ is a Parzen-Rosenblatt kernel, that is to say a bounded real function, integrating to one and such that $tK(t) \to 0$ as $|t| \to \infty$. For instance, one may use a standard Gaussian density. The parameter $h$ is a positive number called the window in this context. In fact, $\hat{s}_j^k(t)$ can be seen as a weighted mean of the residuals $R_{j,i}^{k-1}$ whose principal variable $Y_i^k$ is close to $t$:

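$$\hat{s}_j^k(t) = \sum_{i=1}^{n} w_i^k(t)\, R_{j,i}^{k-1},$$

where the weights are defined by

$$w_i^k(t) = K\left(\frac{t - Y_i^k}{h}\right) \Big/ \sum_{\ell=1}^{n} K\left(\frac{t - Y_\ell^k}{h}\right)$$

and sum to one:

$$\sum_{i=1}^{n} w_i^k(t) = 1.$$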

The amplitude of the smoothing is tuned by $h$. In the case of a kernel with bounded support, for instance if $\mathrm{supp}(K) = [-1, 1]$, the smoothing is performed on an interval of length $2h$. For an automatic choice of the smoothing parameter $h$, we refer to [25], Chapter 6.
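The kernel estimate (14) with a standard Gaussian kernel can be sketched in a few lines of NumPy. The function name `kernel_regression` and the test data are hypothetical, and the window $h$ is fixed by hand here rather than selected automatically as in [25]. Since the weights $w_i^k(t)$ do not depend on $j$, all coordinates $j = k + 1, \dots, p$ can be smoothed at once by passing them as the columns of a matrix.

```python
import numpy as np

def kernel_regression(y, Rj, t, h):
    """Kernel estimate (14) with a standard Gaussian kernel K.
    y: principal variables Y_i^k, shape (n,); Rj: coordinates R_{j,i}^{k-1},
    shape (n,) or (n, m); t: evaluation points; h: window."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # K((t - Y_i)/h); the Gaussian normalising constant cancels in the ratio
    K = np.exp(-0.5 * ((t[:, None] - y[None, :]) / h) ** 2)
    W = K / K.sum(axis=1, keepdims=True)       # weights w_i(t), each row sums to one
    return W @ Rj

rng = np.random.default_rng(2)
y = rng.uniform(-2, 2, 500)
Rj = y ** 2 + 0.1 * rng.standard_normal(500)   # hypothetical coordinate with s_j(t) = t^2
print(kernel_regression(y, Rj, [0.0, 1.0], h=0.2))   # close to [0, 1]
```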

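Projection estimates. Each coordinate $j \in \{k + 1, \dots, p\}$ of the estimator is expanded on a basis of $L$ real functions $\{b_\ell(t),\ \ell = 1, \dots, L\}$ as:

$$\hat{s}_j^k(t) = \sum_{\ell=1}^{L} \alpha_{j,\ell}^k\, b_\ell(t).$$

The coefficients $\alpha_{j,\ell}^k$ appearing in the linear combination of basis functions are determined such that $\hat{s}_j^k(Y_i^k) \approx R_{j,i}^{k-1}$ for $i = 1, \dots, n$. More precisely,

$$\alpha_j^k = \arg\min_{\alpha_j} \sum_{i=1}^{n} \left(R_{j,i}^{k-1} - \sum_{\ell=1}^{L} \alpha_{j,\ell}\, b_\ell(Y_i^k)\right)^2,$$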
and it is well known that this least-squares problem has an explicit solution, which can be written in matrix form as

$$\alpha_j^k = \left({}^t B^k B^k\right)^{-1} {}^t B^k R_j^{k-1}, \tag{15}$$

where $B^k$ is the $n \times L$ matrix with coefficients $B_{i,\ell}^k = b_\ell(Y_i^k)$, $i = 1, \dots, n$, $\ell = 1, \dots, L$, and $R_j^{k-1}$ is the vector $(R_{j,1}^{k-1}, \dots, R_{j,n}^{k-1})$. Note that $B^k$ does not depend on the coordinate $j$. Thus, the matrix inversion in (15) is performed only once at each iteration $k$. Besides, the matrix ${}^t B^k B^k$ to be inverted is of size $L \times L$ and thus depends neither on the dimension $p$ of the space nor on the sample size $n$. As an example, one can use a basis of cubic splines [11]. In this case, the parameter $L$ is directly linked to the number $N$ of knots: $L = N + 4$. Remark that, in this case, the condition $N + 4 \le n$ is required for the matrix ${}^t B^k B^k$ to be regular.
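A sketch of the projection estimate with a cubic spline basis follows. As an assumption, it uses the truncated power basis $1, t, t^2, t^3, (t - \kappa_1)_+^3, \dots, (t - \kappa_N)_+^3$, which spans the cubic splines with $N$ interior knots and has $L = N + 4$ functions. Placing the knots at quantiles of the $Y_i^k$ is likewise a hypothetical choice, as are the function names. Rather than forming $({}^t B^k B^k)^{-1}$ explicitly, the code calls `numpy.linalg.lstsq`, which returns the same solution as (15) when ${}^t B^k B^k$ is regular but is numerically safer. All coordinates $j$ are handled in one call, reflecting the fact that $B^k$ does not depend on $j$.

```python
import numpy as np

def cubic_spline_basis(t, knots):
    """Truncated power basis: 1, t, t^2, t^3, (t - kappa)_+^3 -> an (n, N + 4) matrix."""
    t = np.asarray(t, dtype=float)[:, None]
    powers = t ** np.arange(4)
    truncated = np.clip(t - np.asarray(knots)[None, :], 0.0, None) ** 3
    return np.hstack([powers, truncated])

def projection_estimate(y, R, n_knots):
    """Fit the coefficients (15) for every column of R (one column per coordinate j)."""
    knots = np.quantile(y, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])   # N interior knots
    B = cubic_spline_basis(y, knots)                                     # n x L matrix B^k
    alpha, *_ = np.linalg.lstsq(B, R, rcond=None)                        # L x m coefficients
    return lambda t: cubic_spline_basis(t, knots) @ alpha

rng = np.random.default_rng(3)
y = rng.uniform(-2, 2, 300)
R = np.column_stack([np.sin(2 * y), y ** 3 - y])     # hypothetical coordinates R_{j,i}
s = projection_estimate(y, R, n_knots=4)
print(np.abs(s(y)[:, 1] - R[:, 1]).max())            # a cubic polynomial is fitted exactly
```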



4.2 Computation of principal directions 

The choice of the index $I$ is the key point of any projection pursuit problem where "interesting" directions must be found. We refer to [26] and [28] for a review on this topic. Let us recall that the meaning of the word "interesting" depends on the considered data analysis problem. As mentioned in Subsection 2.2, the most popular index is the projected variance

$$I_{\mathrm{PCA}} \circ P_x(R^{k-1}) = \frac{1}{n} \sum_{i=1}^{n} P_x^2(R_i^{k-1}) \tag{16}$$

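used in PCA. Remarking that, since the residuals are centered, this index can be rewritten as

$$I_{\mathrm{PCA}} \circ P_x(R^{k-1}) = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j \ne i} P_x^2(R_i^{k-1} - R_j^{k-1}),$$

it appears that the "optimal" axis maximizes the mean squared distance between the projected points.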
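The pairwise rewriting of (16) relies on the residuals being centered, so that $\sum_i P_x(R_i^{k-1}) = 0$. A quick numerical check with NumPy, on hypothetical centered data and a random unit direction $x$:

```python
import numpy as np

rng = np.random.default_rng(4)
R = rng.standard_normal((50, 3))
R -= R.mean(axis=0)                          # centred residuals
x = rng.standard_normal(3)
x /= np.linalg.norm(x)                       # unit direction

proj = R @ x                                 # P_x(R_i)
variance_form = np.mean(proj ** 2)           # (16)
# diagonal terms i = j vanish, so summing over all pairs equals summing over j != i
pairwise_form = np.sum((proj[:, None] - proj[None, :]) ** 2) / (2 * len(R) ** 2)
print(np.isclose(variance_form, pairwise_form))
```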
An attractive feature of the index (16) is that its maximization has an explicit solution in terms of the eigenvectors of the empirical covariance matrix $V_{k-1}$ defined in (13). Friedman et al. [?] [13], and more recently Hall [21], proposed indices to find clusters, or used measures of deviation from normality to reveal more complex structures in the scatter plot. An alternative approach can be found in [?], where a particular metric is introduced in PCA so as to detect clusters. We can also mention indices dedicated to outlier detection [30]. Similar problems occur in the neural network context, where the focus is on the construction of nonlinear mappings to unfold the manifold. It is usually required that such a mapping preserve the local topology of the dataset. To this aim, Demartines and Herault [9] introduce an index to detect the directions in which the nonlinear projection approximately preserves distances. Such an index can be adapted to our framework by restricting ourselves to linear projections:

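$$I \circ P_x(R^{k-1}) = \sum_{i=1}^{n} \sum_{j \ne i} \left(\|R_i^{k-1} - R_j^{k-1}\| - |P_x(R_i^{k-1} - R_j^{k-1})|\right)^2 H \circ |P_x(R_i^{k-1} - R_j^{k-1})|.$$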

The function $H$ is assumed to be positive and non-increasing in order to favor the preservation of the local topology. According to the authors, applying this function to the outputs $P_x(R_i^{k-1} - R_j^{k-1})$ instead of the inputs $R_i^{k-1} - R_j^{k-1}$ yields better performance than Kohonen's self-organizing maps [33] [34]. Similarly, the criterion introduced in [43] yields in our case

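$$I \circ P_x(R^{k-1}) = \left(\sum_{i=1}^{n} \sum_{j \ne i} \frac{\left(\|R_i^{k-1} - R_j^{k-1}\| - |P_x(R_i^{k-1} - R_j^{k-1})|\right)^2}{\|R_i^{k-1} - R_j^{k-1}\|}\right) \Big/ \sum_{i=1}^{n} \sum_{j \ne i} \|R_i^{k-1} - R_j^{k-1}\|.$$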

Figure 1: Left: axis $a$ such that the associated projection $P_a$ preserves the first-order neighborhood structure. The regression function $s$ correctly fits the dataset. Right: axis $a$ for which $P_a$ does not preserve the first-order neighborhood structure. The regression function $s$ cannot yield a good approximation of the dataset.

However, in both cases, the resulting functions are nonlinear and thus difficult to optimize with respect to $x$.

Our approach is similar to Lebart's [36]. It consists in defining a contiguity coefficient whose minimization allows nonlinear structures to be unfolded. At each iteration $k$, the following Rayleigh quotient [41] is maximized with respect to $x$:

$$I \circ P_x(R^{k-1}) = \sum_{i=1}^{n} P_x^2(R_i^{k-1}) \Big/ \sum_{i=1}^{n} \sum_{j=1}^{n} m_{i,j}^{k-1}\, P_x^2(R_i^{k-1} - R_j^{k-1}). \tag{17}$$

The matrix $M^{k-1} = (m_{i,j}^{k-1})$ is a first-order contiguity matrix: $m_{i,j}^{k-1} = 1$ when $R_j^{k-1}$ is the nearest neighbor of $R_i^{k-1}$, and $0$ otherwise. The numerator of (17) is proportional to the usual projected variance, see (16). The denominator is the sum of the squared distances between the projections of points which are nearest neighbors in $\mathbb{R}^p$. Hence, the maximization of (17) should reveal directions in which the projection best preserves the first-order neighborhood structure (see Figure 1). In this sense, the index (17) can be seen as a first-order approximation of the index proposed in [6]. Thanks to this approximation, the maximization step has an explicit solution: the resulting principal direction $a^k$ is the eigenvector associated with the maximum eigenvalue of $(V_{k-1}^*)^{-1} V_{k-1}$, where

$$V_{k-1}^* = \sum_{i=1}^{n} \sum_{j=1}^{n} m_{i,j}^{k-1} \left(R_i^{k-1} - R_j^{k-1}\right) {}^t \left(R_i^{k-1} - R_j^{k-1}\right)$$

is proportional to the local covariance matrix. Here, $(V_{k-1}^*)^{-1}$ should be read as the generalized inverse of the singular matrix $V_{k-1}^*$. Indeed, since $R^{k-1}$ is orthogonal to $\{a^1, \dots, a^{k-1}\}$ by (5), $V_{k-1}^*$ is at most of rank $p - k + 1$. Note that this approach is equivalent to Lebart's one when the contiguity matrix $M$ is symmetric.
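A sketch of the contiguity-based step [A] follows, under these assumptions: the first-order contiguity matrix links each point to its single nearest neighbor (ties broken by `argmin`), the names `contiguity_matrices`, `contiguity_axis` and `quotient` are hypothetical, and the generalized inverse is implemented by discarding the numerically null eigenvalues of $V_{k-1}^*$. Instead of forming $(V_{k-1}^*)^{-1} V_{k-1}$, which is not symmetric, the code whitens with $V_{k-1}^*$ and solves a symmetric eigenproblem with the same eigenvalues. The $1/n$ factors cancel in the quotient.

```python
import numpy as np

def contiguity_matrices(R):
    """V_{k-1} (13) and the local covariance V*_{k-1} built from nearest neighbours."""
    n = len(R)
    sq = np.sum((R[:, None, :] - R[None, :, :]) ** 2, axis=2)   # squared distances
    np.fill_diagonal(sq, np.inf)
    nn = sq.argmin(axis=1)                  # m_{i,j} = 1 iff j = nn[i]
    diff = R - R[nn]                        # R_i - R_j over contiguous pairs
    return R.T @ R / n, diff.T @ diff / n

def contiguity_axis(R, tol=1e-10):
    """Leading eigenvector of pinv(V*) V, i.e. the maximiser of the quotient (17)."""
    V, V_loc = contiguity_matrices(R)
    lam, W = np.linalg.eigh(V_loc)
    keep = lam > tol * lam.max()            # generalized inverse: drop the null space
    S = W[:, keep] / np.sqrt(lam[keep])     # S = W Lambda^{-1/2}
    _, Z = np.linalg.eigh(S.T @ V @ S)      # symmetric problem, same eigenvalues
    a = S @ Z[:, -1]
    return a / np.linalg.norm(a)

def quotient(R, x):
    V, V_loc = contiguity_matrices(R)
    return (x @ V @ x) / (x @ V_loc @ x)    # (17); the 1/n factors cancel

# Hypothetical curved dataset in the plane.
rng = np.random.default_rng(6)
t = rng.uniform(-1, 1, 200)
X = np.column_stack([t, 0.3 * np.sin(4 * t)]) + 0.01 * rng.standard_normal((200, 2))
X -= X.mean(axis=0)
a = contiguity_axis(X)
print(a, quotient(X, a))
```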

5 Illustration on real and simulated data 

Our first illustration is done on the "DistortedSShape" simulated dataset introduced in [?], paragraph 5.2.1, and available online at the following address: http://www.iro.umontreal.ca/~kegl/research/pcurves. The dataset consists of 100 data points in $\mathbb{R}^2$ located around a one-dimensional curve (solid line in Figure 2). The bold dashed curve is the one-dimensional manifold estimated by the principal curves approach [24]. The estimated curve fails to follow the shape of the original curve. Using the auto-associative model, the estimated one-dimensional manifold (dashed curve) is closer to the original one. In this experiment, we used one iteration of Algorithm 1 with the contiguity index (17) in combination with a projection estimate of the regression functions. A basis of $N = 4$ cubic splines was used to compute the projection.

Our second illustration is performed on the "dataset I - Five types of breast cancer" provided to us by the organizers of the "Principal Manifolds-2006" workshop. The dataset is available online at the following address: http://www.ihes.fr/~zinovyev/princmanif2006/. It consists of microarray data containing logarithms of expression levels of $p = 17816$ genes in $n = 286$ samples. The data is divided into five types of breast cancer (lumA, lumB, normal, errb2 and basal) plus an unclassified group. First of all, let us note that, since $n$ points necessarily lie in an affine subspace of dimension $n - 1$, the covariance matrix is at most of rank $n - 1 = 285$. Thus, as a preprocessing step, the dimension of the data is reduced to 285 by a classical PCA, without any loss of information. Forgetting the labels, i.e. without using the initial classification into five types of breast cancer, we compare the information ratios $Q_d$ (see Corollary 1) obtained by the classical PCA and by the generalized one (based on auto-associative models). Figure 3 illustrates the behavior of $Q_d$ as the dimension $d$ of the model increases. The bold curve, corresponding to the auto-associative model, was computed with the contiguity index (17) in combination with a projection estimate of the regression functions. A basis of $N = 2$ cubic splines was used to compute the projection. One can see that the generalized PCA yields far better approximation results than the classical one. As an illustration, the one-dimensional manifold is superimposed on the dataset in Figure 4. Each class is represented with a different gray level. For the sake of visualization, the dataset as well as the manifold are projected on the principal plane. Similarly, the two-dimensional manifold is represented in Figure 6 on the linear space spanned by the first three principal axes. Taking the labels into account, it is also possible to compute the one-dimensional manifold associated with each type of cancer and with the unclassified points, see Figure 5. Each manifold then represents a kind of skeleton of the corresponding dataset.
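The rank argument used in the preprocessing step above can be checked on hypothetical data of the same shape type ($n \ll p$); the sizes below are illustrative and are not those of the breast cancer dataset. After centering, $n$ observations span at most $n - 1$ dimensions, so projecting onto the first $n - 1$ principal axes (computed here through an SVD) loses no information.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 500                                 # hypothetical sizes with n << p
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)                        # centring: the rows now sum to zero
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
rank = int(np.sum(s > 1e-10 * s[0]))           # numerical rank of the centred data
Z = Xc @ Vt[: n - 1].T                         # PCA scores in dimension n - 1
print(rank)                                    # n - 1 for generic data
print(np.allclose(Z @ Vt[: n - 1], Xc))        # reconstruction is exact
```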

Other illustrations can be found in [5], Chapter 4, where auto-associative models are applied to some image analysis problems.

Figure 2: Comparison of one-dimensional estimated manifolds on a simulated dataset. Solid line: original curve; dashed line: curve estimated with the auto-associative model approach; bold dashed line: principal curve estimated by the approach proposed in [24].





Figure 4: One-dimensional manifold estimated on a real dataset with the auto-associative models 
approach and projected on the principal plane. 

Figure 6: Two-dimensional manifold estimated on a real dataset with the auto-associative approach and projected on the first three principal axes.